processor fetches four instructions each cycle, regardless of their alignment in the instruction cache, except that it cannot fetch across a 16-word cache block boundary. The fetched words are then aligned in the 4-word Instruction register.
If any instructions were left from the previous decode cycle, they are merged with new words from the instruction cache to fill the Instruction register.
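
The fetch and merge behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the processor's actual logic: the function names `fetch_words` and `fill_instruction_register` are hypothetical, the instruction cache is modeled as a flat Python list of words, and the model assumes that a fetch starting near the end of a cache block simply returns fewer than four words and that fetched words with no room in the register are fetched again later.

```python
FETCH_WIDTH = 4    # instructions fetched per cycle (from the text)
BLOCK_WORDS = 16   # words per instruction cache block (from the text)


def fetch_words(icache, word_addr):
    """Return up to FETCH_WIDTH words starting at word_addr.

    The fetch may begin at any word alignment, but it stops at the end
    of the 16-word cache block containing word_addr (assumed behavior:
    a short fetch rather than a stall).
    """
    words_left_in_block = BLOCK_WORDS - (word_addr % BLOCK_WORDS)
    count = min(FETCH_WIDTH, words_left_in_block)
    return icache[word_addr:word_addr + count]


def fill_instruction_register(leftover, fetched):
    """Merge instructions left from the previous decode cycle with new words.

    Leftover instructions are older, so they are placed first; newly
    fetched words fill the remaining slots of the 4-word register.
    """
    room = FETCH_WIDTH - len(leftover)
    return leftover + fetched[:room]
```

For example, with this model a fetch starting at word 14 of a block yields only two words, because the next word lies in a different cache block, while a fetch starting at word 5 yields a full group of four.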